Subtle motifs: defining the limits of motif finding algorithms

Authors

  • Uri Keich
  • Pavel A. Pevzner
Abstract

MOTIVATION: What constitutes a subtle motif? Intuitively, it is a motif that is almost indistinguishable, in the statistical sense, from random motifs. This question has important practical consequences. Consider, for example, a biologist who is generating a sample of upstream regulatory sequences with the goal of finding a regulatory pattern shared by these sequences. If the sequences are too short, one risks losing some of the regulatory patterns that are located further upstream. Conversely, if the sequences are too long, the motif becomes too subtle, and one is then likely to encounter random motifs that are at least as statistically significant as the regulatory pattern itself. In practical terms, one would like to recognize the sequence length threshold, or the twilight zone, beyond which the motifs are in some sense too subtle.

RESULTS: The paper defines the motif twilight zone, where every motif finding algorithm would be exposed to random motifs that are as significant as the one being sought. We also propose an objective tool for evaluating the performance of subtle motif finding algorithms. Finally, we apply these tools to evaluate the success of our MULTIPROFILER algorithm in detecting subtle motifs.
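The effect of sequence length described in the abstract can be illustrated with a rough calculation. The sketch below is not the paper's definition of the twilight zone; it assumes uniform, independent random bases and uses a common back-of-envelope approximation that treats the positions within a sequence as independent. The function names and the parameter values (motif length `l`, allowed mismatches `d`, sequence length `n`, number of sequences `t`) are hypothetical and chosen only for illustration.

```python
from math import comb


def neighbor_prob(l: int, d: int) -> float:
    """Probability that a uniformly random l-mer lies within Hamming
    distance d of a fixed l-mer (assumption: i.i.d. uniform DNA bases)."""
    return sum(comb(l, i) * 0.75**i * 0.25 ** (l - i) for i in range(d + 1))


def expected_random_motifs(l: int, d: int, n: int, t: int) -> float:
    """Approximate expected number of l-mers that occur, with at most d
    mismatches, in every one of t random sequences of length n.
    Assumption: the n - l + 1 windows of a sequence are treated as independent."""
    p = neighbor_prob(l, d)
    hit_in_one_sequence = 1.0 - (1.0 - p) ** (n - l + 1)
    return 4**l * hit_in_one_sequence**t


if __name__ == "__main__":
    # Illustrative values only: a 15-mer with up to 4 mismatches in 20 sequences.
    for n in (300, 600, 1000, 2000):
        print(n, expected_random_motifs(15, 4, n, 20))
```

Under these assumptions the expected count grows quickly with `n`. Once it is no longer small, purely random patterns can look as strong as an implanted one, which is the intuition behind a length threshold.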


Similar resources

Finding subtle motifs by branching from sample strings

Many motif finding algorithms apply local search techniques to a set of seeds. For example, GibbsDNA (Lawrence et al. 1993, Science, 262, 208-214) applies Gibbs sampling to random seeds, and MEME (Bailey and Elkan, 1994, Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology (ISMB-94), 28-36) applies the EM algorithm to selected sample strings...


HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences

MOTIVATION: Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local ...


Finding subtle motifs with variable gaps in unaligned DNA sequences

Biologists have determined that the control and regulation of gene expression is primarily determined by relatively short sequences in the region surrounding a gene. These sequences vary in length, position, redundancy, orientation, and bases. Finding these short sequences is a fundamental problem in molecular biology with important applications. Though there exist many different approaches to ...


Efficient Algorithms for Model-Based Motif Discovery from Multiple Sequences

We study a natural probabilistic model for motif discovery that has been used to experimentally test the quality of motif discovery programs. In this model, there are k background sequences, and each character in a background sequence is a random character from an alphabet Σ. A motif G = g1g2 . . . gm is a string of m characters. Each background sequence is implanted with a randomly generated approx...


A study of the vase motif in the rural kheshti (panelled) carpets of Chaharmahal and Bakhtiari (with emphasis on the Chaleshtor, Shalamzar and Boldaji regions)

The ancient vase motif, whether used on its own or combined with other motifs, is one of the most basic patterning motifs in the visual culture of Iran and is among the manifestations of eternal verdancy. The multiplicity of its types, its various forms, the flexibility of its structure and its combination with other motifs have made the vase motif a basic and guiding element for...


Journal:
  • Bioinformatics

Volume 18, Issue 10

Pages: -

Publication date: 2002